MODERATE DEVIATIONS IN A RANDOM GRAPH AND FOR THE 
SPECTRUM OF BERNOULLI RANDOM MATRICES 



Hanna Döring, Peter Eichelsbacher



Abstract: We prove a moderate deviation principle for subgraph count statistics of
Erdős-Rényi random graphs. This is equivalent to showing a moderate deviation principle for
the trace of a power of a Bernoulli random matrix. The proof proceeds via an estimate of the
log-Laplace transform and the Gärtner-Ellis theorem. We obtain upper bounds on the upper
tail probabilities of the number of occurrences of small subgraphs. The method of proof is
also used to show moderate deviation principles for a class of symmetric statistics,
including non-degenerate U-statistics with independent or Markovian entries.



1. Introduction 

1.1. Subgraph-count statistics. Consider an Erdős-Rényi random graph with n vertices,
where for all $\binom{n}{2}$ different pairs of vertices the existence of an edge is decided by an
independent Bernoulli experiment with probability p. For each $i \in \{1, \dots, \binom{n}{2}\}$, let $X_i$ be the
random variable determining if the edge $e_i$ is present, i.e. $P(X_i = 1) = 1 - P(X_i = 0) = p(n) =: p$.
The following statistic counts the number of subgraphs isomorphic to a fixed graph G with
k edges and l vertices:

$$W = \sum_{1 \le \kappa_1 < \dots < \kappa_k \le \binom{n}{2}} 1_{\{(e_{\kappa_1}, \dots, e_{\kappa_k}) \cong G\}} \prod_{i=1}^{k} X_{\kappa_i} .$$

Here $(e_{\kappa_1}, \dots, e_{\kappa_k})$ denotes the graph with edges $e_{\kappa_1}, \dots, e_{\kappa_k}$ present and $A \cong G$ denotes the
fact that the subgraph A of the complete graph is isomorphic to G. We assume G to be a



1 Ruhr-Universität Bochum, Fakultät für Mathematik, NA 3/67, D-44780 Bochum, Germany,
hanna.doering@ruhr-uni-bochum.de

2 Ruhr-Universität Bochum, Fakultät für Mathematik, NA 3/68, D-44780 Bochum, Germany,
peter.eichelsbacher@ruhr-uni-bochum.de






HANNA DORING, PETER EICHELSBACHER 



graph without isolated vertices and to consist of $l \ge 3$ vertices and $k \ge 2$ edges. Let the
constant a := aut(G) denote the order of the automorphism group of G. The number of
copies of G in $K_n$, the complete graph with n vertices and $\binom{n}{2}$ edges, is given by $\binom{n}{l} \frac{l!}{a}$
and the expectation of W is equal to

$$E[W] = \binom{n}{l} \frac{l!}{a}\, p^k = O(n^l p^k) .$$
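
The formula for E[W] can be checked on small cases. The following is a minimal sketch, not part of the paper: the function names are illustrative, and the brute-force counter only handles the triangle (l = k = 3, a = 6).

```python
from itertools import combinations
from math import comb, factorial


def expected_copies(n, l, k, a, p):
    """E[W] = C(n, l) * l!/a * p^k for a graph G with l vertices, k edges, aut(G) = a."""
    # C(n, l) * l! / a is the (integer) number of copies of G in K_n.
    return comb(n, l) * factorial(l) // a * p ** k


def count_triangles(n, edges):
    """Brute-force W for G = triangle on the vertex set 0..n-1 (illustrative helper)."""
    present = {frozenset(e) for e in edges}
    return sum(
        1
        for u, v, w in combinations(range(n), 3)
        if {frozenset((u, v)), frozenset((u, w)), frozenset((v, w))} <= present
    )
```

For p = 1 every edge is present, so W is deterministic and equals the number of copies of G in $K_n$; for the triangle in $K_4$ both functions give 4.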



It is easy to see that $P(W > 0) = o(1)$ if $p \ll n^{-l/k}$. Moreover, for the graph property that
G is a subgraph, the probability that a random graph possesses it jumps from 0 to 1 at the
threshold probability $n^{-1/m(G)}$, where

$$m(G) = \max\left\{ \frac{e_H}{v_H} : H \subseteq G,\ v_H > 0 \right\},$$

and $e_H$, $v_H$ denote the number of edges and vertices of $H \subseteq G$, respectively, see [JLR00].
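
For small graphs m(G) can be computed by exhaustive search. The sketch below is an illustration under the assumption that G is given as a vertex list and an edge list; the name `max_density` is hypothetical. It uses the fact that, for a fixed vertex set, the induced subgraph has the most edges, so only induced subgraphs need to be scanned.

```python
from fractions import Fraction
from itertools import combinations


def max_density(vertices, edges):
    """m(G) = max e_H / v_H over subgraphs H of G with v_H > 0 (exhaustive search)."""
    best = Fraction(0)
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            # Edges of the subgraph induced by the chosen vertices.
            e_h = sum(1 for u, v in edges if u in chosen and v in chosen)
            best = max(best, Fraction(e_h, size))
    return best
```

For the triangle this gives m(G) = 1, so the threshold probability is of order 1/n, while a path with two edges has m(G) = 2/3.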

Limiting Poisson and normal distributions for subgraph counts were studied for probability
functions p = p(n). For an arbitrary graph G, Ruciński proved in [Ruc88] that W is
Poisson convergent if and only if

$$n p^{d(G)} \to 0 \quad \text{or} \quad n^{\beta(G)} p \to 0 \quad (n \to \infty) .$$

Here d(G) denotes the density of the graph G and

$$\beta(G) := \max\left\{ \frac{v_G - v_H}{e_G - e_H} : H \subset G \right\} .$$



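Consider

$$c_{n,p} := \binom{n-2}{l-2} \frac{2k}{a} (l-2)! \sqrt{\binom{n}{2} p(1-p)}\; p^{k-1} \qquad (1.1)$$

and

$$Z := \frac{W - E(W)}{c_{n,p}} = \frac{1}{c_{n,p}} \sum_{1 \le \kappa_1 < \dots < \kappa_k \le \binom{n}{2}} 1_{\{(e_{\kappa_1}, \dots, e_{\kappa_k}) \cong G\}} \left( \prod_{i=1}^{k} X_{\kappa_i} - p^k \right) . \qquad (1.2)$$

Z has asymptotic standard normal distribution if $np^{k-1} \to \infty$ and $n^2(1-p) \to \infty$ as $n \to \infty$,
see Nowicki, Wierman [NW88]. For an arbitrary graph G with at least one edge, Ruciński
proved in [Ruc88] that $\frac{W - E(W)}{\sqrt{V(W)}}$ converges in distribution to a standard normal distribution
if and only if

$$n p^{m(G)} \to \infty \quad \text{and} \quad n^2 (1 - p) \to \infty \quad (n \to \infty) . \qquad (1.3)$$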

Here and in the following V denotes the variance of the corresponding random variable.
Ruciński closed the book on asymptotic normality by applying the method of moments.
One may wonder about the normalization (1.1) used in [NW88]. The subgraph count W
is a sum of dependent random variables, for which the exact calculation of the variance is
tedious. In [NW88], the authors approximated W by a projection of W, which is a sum of
independent random variables. For this sum the variance calculation is elementary, which yields
the denominator (1.1) in the definition of Z. The asymptotic behaviour of the variance of W
for any p = p(n) is summarized in Section 2 of [Ruc88]. The method of martingale differences
used by Catoni in [Cat03] gives, under the conditions $np^{3(k-\frac{1}{2})} \to \infty$ and $n^2(1-p) \to \infty$ as $n \to \infty$,
an alternative proof of the central limit theorem, see Remark 4.2.

A common next step is to prove large and moderate deviations, namely, the asymptotic
computation of small probabilities on an exponential scale. The interest in the moderate scale
lies in the transition from a convergence-in-distribution result, such as a central limit
theorem, to the large deviations scaling. Interestingly enough, proving that the
subgraph count random variable W satisfies a large or a moderate deviation principle has been an
unsolved problem up to now. The main goal of this paper is to prove a moderate deviation
principle for the rescaled Z, filling a substantial gap in the literature on asymptotic subgraph
count distributions, see Theorem 1.1. Before we recall the definition of a moderate deviation
principle and state our result, let us remark that exponentially small probabilities have been
studied extensively in the literature. A famous upper bound for lower tails was proven by
Janson [Jan90], applying the FKG inequality. This inequality leads to good upper bounds
for the probability of nonexistence W = 0. Upper bounds for upper tails were derived by Vu
[Vu01], Kim and Vu [KV04] and recently by Janson, Oleszkiewicz and Ruciński [JOR04] and
in [JR04] by Janson and Ruciński. A comparison of seven techniques proving bounds for the
infamous upper tail can be found in [JR02]. In Theorem 1.3 we also obtain upper bounds
on the upper tail probabilities of W.

Let us recall the definition of a large deviation principle (LDP). A sequence of probability
measures $\{(\mu_n), n \in \mathbb{N}\}$ on a topological space $\mathcal{X}$ equipped with a σ-field $\mathcal{B}$ is said to satisfy
the LDP with speed $s_n \nearrow \infty$ and good rate function I(·) if the level sets $\{x : I(x) \le \alpha\}$ are
compact for all $\alpha \in [0, \infty)$ and for all $\Gamma \in \mathcal{B}$ the lower bound

$$\liminf_{n \to \infty} \frac{1}{s_n} \log \mu_n(\Gamma) \ge - \inf_{x \in \mathrm{int}(\Gamma)} I(x)$$

and the upper bound

$$\limsup_{n \to \infty} \frac{1}{s_n} \log \mu_n(\Gamma) \le - \inf_{x \in \mathrm{cl}(\Gamma)} I(x)$$

hold. Here int(Γ) and cl(Γ) denote the interior and closure of Γ respectively. We say a
sequence of random variables satisfies the LDP when the sequence of measures induced by
these variables satisfies the LDP. Formally a moderate deviation principle is nothing else
but the LDP. However, we will speak about a moderate deviation principle (MDP) for a
sequence of random variables, whenever the scaling of the corresponding random variables
is between that of an ordinary Law of Large Numbers and that of a Central Limit Theorem.

In the following, we state one of our main results, a moderate deviation principle for the
rescaled subgraph count statistic W when p is fixed, and when the sequence p(n) converges
to 0 or 1 sufficiently slowly.

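Theorem 1.1. Let G be a fixed graph without isolated vertices, consisting of $k \ge 2$ edges
and $l \ge 3$ vertices. The sequence $(\beta_n)_n$ is assumed to be increasing with

$$n^{l-1} p^{k-1} \sqrt{p(1-p)} \ll \beta_n \ll n^{l} \left( p^{k-1} \sqrt{p(1-p)} \right)^4 . \qquad (1.4)$$

Then the sequence $(S_n)_n$ of subgraph count statistics

$$S_n := \frac{1}{\beta_n} \sum_{1 \le \kappa_1 < \dots < \kappa_k \le \binom{n}{2}} 1_{\{(e_{\kappa_1}, \dots, e_{\kappa_k}) \cong G\}} \left( \prod_{i=1}^{k} X_{\kappa_i} - p^k \right)$$

satisfies a moderate deviation principle with speed

$$s_n = \frac{\left( \frac{2k}{a} (l-2)! \right)^2 \beta_n^2}{c_{n,p}^2} = \frac{\beta_n^2}{\binom{n-2}{l-2}^2 \binom{n}{2}\, p^{2k-1} (1-p)} \qquad (1.5)$$

and rate function I defined by

$$I(x) = \frac{x^2}{2 \left( \frac{2k}{a} (l-2)! \right)^2} . \qquad (1.6)$$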

Remarks 1.2. (1) Using $\binom{n-2}{l-2}^2 \binom{n}{2} \le n^{2(l-1)}$, we obtain

$$s_n \ge \left( \frac{\beta_n}{n^{l-1} p^{k-1} \sqrt{p(1-p)}} \right)^2 ;$$

therefore the condition $n^{l-1} p^{k-1} \sqrt{p(1-p)} \ll \beta_n$
implies that $s_n$ is growing to infinity as $n \to \infty$ and hence is a speed.
(2) If we choose $\beta_n$ such that $\beta_n \ll n^{l} \left( p^{k-1} \sqrt{p(1-p)} \right)^4$, the fact that $s_n$ is a
speed implies that

$$n^2 p^{6k-3} (1-p)^3 \to \infty \quad (n \to \infty) . \qquad (1.7)$$

This is a necessary but not a sufficient condition for (1.4).
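
To get a feeling for the window (1.4) and the speed (1.5), the quantities can be evaluated numerically. This is only a sketch under the reading of (1.4) and (1.5) given above; the function names are illustrative, and the asymptotic relation "≪" is of course not something a finite computation can verify.

```python
from math import comb, sqrt


def beta_window(n, l, k, p):
    """Lower and upper scale for beta_n in (1.4)."""
    q = p ** (k - 1) * sqrt(p * (1 - p))
    return n ** (l - 1) * q, n ** l * q ** 4


def mdp_speed(n, l, k, p, beta_n):
    """s_n = beta_n^2 / (C(n-2, l-2)^2 * C(n, 2) * p^(2k-1) * (1-p)), as in (1.5)."""
    return beta_n ** 2 / (comb(n - 2, l - 2) ** 2 * comb(n, 2) * p ** (2 * k - 1) * (1 - p))


def speed_lower_bound(n, l, k, p, beta_n):
    """The bound of Remarks 1.2 (1): (beta_n / (n^(l-1) p^(k-1) sqrt(p(1-p))))^2."""
    lower, _ = beta_window(n, l, k, p)
    return (beta_n / lower) ** 2
```

For triangles with p = 1/2 and n = 1000 the two scales are 125000 and roughly 244141, so the window is non-empty; for much smaller n it is empty, which reflects the necessary condition (1.7).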

The approach to prove Theorem 1.1 additionally yields a central limit theorem for
$Z = \frac{W - EW}{c_{n,p}}$, see Remark 4.2, and a concentration inequality for W - EW:

Theorem 1.3. Let G be a fixed graph without isolated vertices, consisting of $k \ge 2$ edges
and $l \ge 3$ vertices and let W be the number of copies of G. Then for every $\varepsilon > 0$

$$P(W - EW \ge \varepsilon\, EW) \le \exp\left( - \frac{\mathrm{const.}\ \varepsilon^2 n^{2l} p^{2k}}{n^{2l-2} p^{2k-1} (1-p) + \mathrm{const.}\ \varepsilon\, n^{2l-2} p^{1-k} (1-p)^{-1}} \right),$$

where the constants depend only on l and k.

We will give a proof of Theorem 1.1 and Theorem 1.3 at the end of Section 4.

Remark 1.4. Let us consider the example of counting triangles: l = k = 3, a = 6. The
necessary condition (1.7) of the moderate deviation principle becomes

$$n^2 p^{15} \to \infty \quad \text{and} \quad n^2 (1-p)^3 \to \infty \quad \text{as } n \to \infty .$$

This can be compared to the expectedly weaker necessary and sufficient condition for the
central limit theorem for Z in [Ruc88]:

$$np \to \infty \quad \text{and} \quad n^2 (1-p) \to \infty \quad \text{as } n \to \infty .$$

The concentration inequality in Theorem 1.3 for triangles becomes

$$P(W - EW \ge \varepsilon\, EW) \le \exp\left( - \frac{\mathrm{const.}\ \varepsilon^2 n^6 p^6}{n^4 p^5 (1-p) + \mathrm{const.}\ \varepsilon\, n^4 p^{-2} (1-p)^{-1}} \right) \quad \forall \varepsilon > 0 .$$

Kim and Vu showed in [KV04] for all $0 < \varepsilon \le 0.1$ and for $p \ge \frac{1}{n} \log n$ that

$$P\left( \frac{W - EW}{\varepsilon p^3 n^3} \ge 1 \right) \le e^{-\Theta(p^2 n^2)} .$$

As we will see in the proof of Theorem 1.3, the bound for d(n) in (1.12) leads to an additional
term of order $n^2 p^8$. Hence in general our bounds are not optimal. Optimal bounds were
obtained only for some subgraphs. Our concentration inequality can be compared with the
bounds in [JR02], which we leave to the reader.



1.2. Bernoulli random matrices. Theorem 1.1 can be reformulated as a moderate
deviation principle for traces of a power of a Bernoulli random matrix.

Theorem 1.5. Let $X = (X_{ij})_{i,j}$ be a symmetric n × n-matrix of independent real-valued
random variables, Bernoulli-distributed with probability

$$P(X_{ij} = 1) = 1 - P(X_{ij} = 0) = p(n), \quad i < j,$$

and $P(X_{ii} = 0) = 1$, i = 1, ..., n. Consider for any fixed $k \ge 3$ the trace of the matrix to
the power k

$$\mathrm{Tr}(X^k) = \sum_{i_1, \dots, i_k = 1}^{n} X_{i_1 i_2} X_{i_2 i_3} \cdots X_{i_k i_1} . \qquad (1.8)$$

Note that $\mathrm{Tr}(X^k) = 2 W$, for W counting cycles of length k in a random graph. We obtain
that the sequence $(T_n)_n$ with

$$T_n := \frac{\mathrm{Tr}(X^k) - E[\mathrm{Tr}(X^k)]}{2 \beta_n} \qquad (1.9)$$

satisfies a moderate deviation principle for any $\beta_n$ satisfying (1.4) with l = k and with rate
function (1.6) with l = k and a = 2k:

$$I(x) = \frac{x^2}{2 \left( (k-2)! \right)^2} . \qquad (1.10)$$

Remark 1.6. The following is a famous open problem in random matrix theory: Consider
X to be a symmetric n × n matrix with entries $X_{ij}$ ($i \le j$) being i.i.d., satisfying some
exponential integrability. The question is to prove for any fixed $k \ge 3$ an LDP for

$$\frac{1}{n^k} \mathrm{Tr}(X^k)$$

and the MDP for

$$\frac{1}{\beta_n(k)} \left( \mathrm{Tr}(X^k) - E[\mathrm{Tr}(X^k)] \right) \qquad (1.11)$$

for a properly chosen sequence $\beta_n(k)$. For k = 1 the LDP in question immediately follows
from Cramér's theorem (see [DZ98, Theorem 2.2.3]), since

$$\frac{1}{n} \mathrm{Tr}(X) = \frac{1}{n} \sum_{i=1}^{n} X_{ii} .$$

For k = 2, notice that

$$\frac{1}{n^2} \mathrm{Tr}(X^2) = \frac{2}{n^2} \sum_{i<j} X_{ij}^2 + \frac{1}{n^2} \sum_{i=1}^{n} X_{ii}^2 =: A_n + B_n .$$

By Cramér's theorem we know that $(\tilde{A}_n)_n$ with $\tilde{A}_n := \binom{n}{2}^{-1} \sum_{i<j} X_{ij}^2$ satisfies the LDP, and
by Chebychev's inequality we obtain for any $\varepsilon > 0$

$$\limsup_{n \to \infty} \frac{1}{n^2} \log P(|B_n| \ge \varepsilon) = -\infty .$$

Hence $(A_n)_n$ and $(\frac{1}{n^2} \mathrm{Tr}(X^2))_n$ are exponentially equivalent (see [DZ98, Definition 4.2.10]).
Moreover $(A_n)_n$ and $(\tilde{A}_n)_n$ are exponentially equivalent, since Chebychev's inequality leads
to

$$\limsup_{n \to \infty} \frac{1}{n^2} \log P(|A_n - \tilde{A}_n| > \varepsilon) = \limsup_{n \to \infty} \frac{1}{n^2} \log P\left( \Big| \sum_{i<j} X_{ij}^2 \Big| \ge \varepsilon\, \frac{n^2 (n-1)}{2} \right) = -\infty .$$



Applying Theorem 4.2.13 in [DZ98], we obtain the LDP for $(\frac{1}{n^2} \mathrm{Tr}(X^2))_n$ under exponential
integrability. For $k \ge 3$, proving the LDP for $(\frac{1}{n^k} \mathrm{Tr}(X^k))_n$ is open, even in the Bernoulli
case. For Gaussian entries with mean 0 and variance 1/n, the LDP for the sequence of
empirical measures of the corresponding eigenvalues $\lambda_1, \dots, \lambda_n$, i.e.

$$\frac{1}{n} \sum_{i=1}^{n} \delta_{\lambda_i},$$

has been established by Ben Arous and Guionnet in [BAG97]. Although one has the
representation

$$\frac{1}{n^k} \mathrm{Tr}(X^k) = \frac{1}{n^{k/2}} \mathrm{Tr}\left( \Big( \frac{X}{\sqrt{n}} \Big)^k \right) = \frac{1}{n^{k/2}} \sum_{i=1}^{n} \lambda_i^k ,$$

the LDP cannot be deduced from the LDP of the empirical measure by the contraction
principle [DZ98, Theorem 4.2.1], because $x \mapsto x^k$ is not bounded in this case.

Remark 1.7. Theorem 1.5 tells us that in the case of Bernoulli random variables $X_{ij}$, the
MDP for (1.11) holds for any $k \ge 3$. For k = 1 and k = 2, the MDP for (1.11) holds for
arbitrary i.i.d. entries $X_{ij}$ satisfying some exponential integrability: For k = 1 we choose
$\beta_n(1) := a_n$ with $a_n$ any sequence with $\lim_{n \to \infty} \frac{\sqrt{n}}{a_n} = 0$ and $\lim_{n \to \infty} \frac{n}{a_n} = \infty$. For

$$\frac{1}{a_n} \sum_{i=1}^{n} \left( X_{ii} - E(X_{ii}) \right)$$

the MDP holds with rate $x^2 / (2 V(X_{11}))$ and speed $a_n^2 / n$, see Theorem 3.7.1 in [DZ98]. In
the case of Bernoulli random variables, we choose $\beta_n(1) = a_n$ with $(a_n)_n$ any sequence with

lim ^ (1 " P) =0and lim ^ = P) = oo 

n—>oo a n n—>oo a n 

and p = p(n). Now $\left( \frac{1}{a_n} \sum_{i=1}^{n} (X_{ii} - E(X_{ii})) \right)_n$ satisfies the MDP with rate function $x^2/2$ and
speed

$$\frac{a_n^2}{n p(n) (1 - p(n))} .$$

Hence, in this case p(n) has to fulfill the condition $n^2 p(n) (1 - p(n)) \to \infty$.

For k = 2, we choose $\beta_n(2) = a_n$ with $a_n$ being any sequence with $\lim_{n \to \infty} \frac{n}{a_n} = 0$ and
$\lim_{n \to \infty} \frac{n^2}{a_n} = \infty$. Applying Chebychev's inequality and exponential equivalence arguments
similar to those in Remark 1.6, we obtain the MDP for

$$\frac{1}{a_n} \sum_{i,j=1}^{n} \left( X_{ij}^2 - E(X_{ij}^2) \right)$$

with rate $x^2 / (2 V(X_{11}))$ and speed $a_n^2 / n^2$. The case of Bernoulli random variables can be
obtained in a similar way.

Remark 1.8. For $k \ge 3$ we obtain the MDP with $\beta_n = \beta_n(k)$ such that

$$n^{k-1} p(n)^{k-1} \sqrt{p(n)(1 - p(n))} \ll \beta_n \ll n^{k} \left( p(n)^{k-1} \sqrt{p(n)(1 - p(n))} \right)^4 .$$

Considering a fixed p, the range of $\beta_n$ is what we should expect: $n^{k-1} \ll \beta_n \ll n^k$. But we
also obtain the MDP for functions p(n). In random matrix theory, Wigner 1959 analysed
Bernoulli random matrices in nuclear physics. Interestingly enough, a moderate deviation
principle for the empirical mean of the eigenvalues of a random matrix is known only for
symmetric matrices with Gaussian entries and for non-centered Gaussian entries,
respectively, see [DGZ03]. The proofs depend on the existence of an explicit formula for the joint
distribution of the eigenvalues or on corresponding matrix-valued stochastic processes.



1.3. Symmetric Statistics. On the way to proving Theorem 1.1 we will apply a nice result
of Catoni [Cat03, Theorem 1.1]. Doing so, we recognized that Catoni's approach leads to
a general method for proving a moderate deviation principle for a rich class of statistics, which,
without loss of generality, can be assumed to be symmetric statistics. Let us make this
more precise. In [Cat03], non-asymptotic bounds of the log-Laplace transform of a function
f of k(n) random variables $X := (X_1, \dots, X_{k(n)})$ lead to concentration inequalities. These
inequalities can be obtained for independent random variables or for Markov chains. It is
assumed in [Cat03] that the partial finite differences of order one and two of f are suitably
bounded. The line of proof is a combination of a martingale difference approach and a Gibbs
measure philosophy.



Let $(\Omega, \mathcal{A})$ be the product of measurable spaces $\otimes_{i=1}^{k(n)} (\mathcal{X}_i, \mathcal{B}_i)$ and $P = \otimes_{i=1}^{k(n)} \mu_i$ be a product
probability measure on $(\Omega, \mathcal{A})$. Let $X_1, \dots, X_{k(n)}$ take values in $(\Omega, \mathcal{A})$ and assume that
$(X_1, \dots, X_{k(n)})$ is the canonical process. Let $(Y_1, \dots, Y_{k(n)})$ be an independent copy of
$X := (X_1, \dots, X_{k(n)})$ such that $Y_i$ is distributed according to $\mu_i$, i = 1, ..., k(n). The function
$f : \Omega \to \mathbb{R}$ is assumed to be bounded and measurable.

Let $\Delta_i f(x_1^{k(n)}; y_i)$ denote the partial difference of order one of f defined by

$$\Delta_i f(x_1^{k(n)}; y_i) := f(x_1, \dots, x_{k(n)}) - f(x_1, \dots, x_{i-1}, y_i, x_{i+1}, \dots, x_{k(n)}) ,$$

where $x_1^{k(n)} := (x_1, \dots, x_{k(n)}) \in \Omega$ and $y_i \in \mathcal{X}_i$. Analogously we define for j < i and $y_j \in \mathcal{X}_j$
the partial difference of order two

$$\Delta_i \Delta_j f(x_1^{k(n)}; y_j, y_i) := \Delta_i f(x_1^{k(n)}; y_i) - f(x_1, \dots, x_{j-1}, y_j, x_{j+1}, \dots, x_{k(n)}) + f(x_1, \dots, x_{j-1}, y_j, x_{j+1}, \dots, x_{i-1}, y_i, x_{i+1}, \dots, x_{k(n)}) .$$

Now we can state our main theorem. If the random variables are independent and if the
partial finite differences of the first and second order of f are suitably bounded, then f,
properly rescaled, satisfies the MDP:

Theorem 1.9. In the above setting assume that the random variables in X are independent.
Define d(n) by

$$d(n) := \sum_{i=1}^{k(n)} |\Delta_i f(X_1^{k(n)}; Y_i)|^2 \left( \frac{1}{3} |\Delta_i f(X_1^{k(n)}; Y_i)| + \frac{1}{4} \sum_{j=1}^{i-1} |\Delta_i \Delta_j f(X_1^{k(n)}; Y_j, Y_i)| \right) . \qquad (1.12)$$

Moreover let there exist two sequences $(s_n)_n$ and $(t_n)_n$ such that

(1) $\frac{s_n^2}{t_n^3} d(n) \to 0$ as $n \to \infty$ for all $\omega \in \Omega$, and

(2) $\frac{s_n}{t_n^2} V f(X) \to C > 0$ as $n \to \infty$ for the variance of f.

Then the sequence of random variables

$$\left( \frac{f(X) - E[f(X)]}{t_n} \right)_n$$

satisfies a moderate deviation principle with speed $s_n$ and rate function $\frac{x^2}{2C}$.
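
The partial differences and the quantity d(n) of (1.12) are straightforward to evaluate for a concrete function and one realisation of X and Y. The following sketch is an illustration only; the names `delta1`, `delta2` and `d_n` are hypothetical, indices are 0-based, and the coefficients 1/3 and 1/4 follow the reading of (1.12) given above.

```python
def delta1(f, x, i, y_i):
    """First-order difference: f(x) - f(x with x_i replaced by y_i)."""
    x_i = list(x)
    x_i[i] = y_i
    return f(x) - f(x_i)


def delta2(f, x, i, j, y_i, y_j):
    """Second-order difference for j < i: f(x) - f(x^(i)) - f(x^(j)) + f(x^(i,j))."""
    x_j = list(x)
    x_j[j] = y_j
    x_ij = list(x_j)
    x_ij[i] = y_i
    return delta1(f, x, i, y_i) - f(x_j) + f(x_ij)


def d_n(f, x, y):
    """d(n) of (1.12), evaluated at one realisation x of X and y of Y."""
    total = 0.0
    for i in range(len(x)):
        first = abs(delta1(f, x, i, y[i]))
        second = sum(abs(delta2(f, x, i, j, y[i], y[j])) for j in range(i))
        total += first ** 2 * (first / 3 + second / 4)
    return total
```

For an additive function such as `sum` all second-order differences vanish, so only the cubic first-order terms contribute to d(n); for a product of two coordinates the second-order term is non-zero.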



In Section 2 we are going to prove Theorem 1.9 via the Gärtner-Ellis theorem. In [Cat03]
an inequality has been proved which allows one to relate the logarithm of a Laplace transform
to the expectation and the variance of the observed random variable. Catoni proves a
similar result for the logarithm of a Laplace transform of random variables with Markovian
dependence. One can find a different d(n) in [Cat03, Theorem 3.1]. To simplify notation
we did not generalize Theorem 1.9, but the proof can be adapted immediately. In Section 3
we obtain moderate deviations for several symmetric statistics, including the sample mean
and U-statistics with independent and Markovian entries. In Section 4 we prove Theorems
1.1 and 1.3.

2. Moderate Deviations via Laplace Transforms

Theorem 1.9 is an application of the following theorem:

Theorem 2.1. (Catoni, 2003)
In the setting of Theorem 1.9, assuming that the random variables in X are independent,
one obtains for all $s \in \mathbb{R}_+$

$$\left| \log E \exp(s f(X)) - s E[f(X)] - \frac{s^2}{2} V f(X) \right| \le s^3 d(n) \qquad (2.13)$$

$$= \sum_{i=1}^{k(n)} \frac{s^3}{3} |\Delta_i f(X_1^{k(n)}; Y_i)|^3 + \sum_{i=1}^{k(n)} \sum_{j=1}^{i-1} \frac{s^3}{4} |\Delta_i f(X_1^{k(n)}; Y_i)|^2\, |\Delta_i \Delta_j f(X_1^{k(n)}; Y_j, Y_i)| .$$

Proof of Theorem 2.1. We decompose f(X) into martingale differences

$$F_i(f(X)) = E[f(X) \mid X_1, \dots, X_i] - E[f(X) \mid X_1, \dots, X_{i-1}] , \quad \text{for all } i \in \{1, \dots, k(n)\} .$$

The variance can be represented by $V f(X) = \sum_{i=1}^{k(n)} E\left[ (F_i(f(X)))^2 \right]$.

Catoni uses the triangle inequality and compares the two terms $\log E e^{s f(X) - s E[f(X)]}$ and
$\frac{s^2}{2} V f(X)$ to the above representation of the variance with respect to the Gibbs measure with
density

$$dP_W := \frac{e^{W}}{E[e^{W}]}\, dP ,$$

where W is a bounded measurable function of $(X_1, \dots, X_{k(n)})$. We denote an expectation
with respect to this Gibbs measure by $E_W$, e.g.

$$E_W[X] := \frac{E[X \exp(W)]}{E[\exp(W)]} .$$
On the one hand Catoni bounds the difference 

2 k(n) 

logEe s/ W- sE [ / W]--VE r | I r(i^(/(X))) 2 ] 

1=1 L 1 J 

via partial integration: 

fc(ri) 

logEe-C/W-WWl) --VE r r , i[if(/(X))l 

8=1 

fc(n) ,. s 



[/(JO-KI/Wllxi,...,^-!] 



IE 

i=l 



s — a 



■ivl 3 r | -r [Fi(f(X)))da 

2 S E[/(X)-E[/(X)]|x 1 ,...,X I _ 1 J+a^ 1 (/(X)) L 



where Mjj[X] := [(X — Ef/[X]) ] for a bounded measurable function U of (Xi, . . . , X fc ( n )). 
Moreover 

k(n) 



|E 

8=1 



s — a 



si; 



[/(X)-E[/(X)] Ixi,...^-!] +aF t (f(X)) 1 
k(n) 3 



[F,(/(X))]da 



i=l J ° ' 8=1 ^ 
On the other hand he uses the following calculation:

2 fc(ra) 



< 



i=l L 1 J 

2 A;(n) 2 fc(fi) 

t£ e 4,«>-e W .v„|* *_j [(WW))*] - j£4(W(*))) 2 ] 

i=l L 1 J i=l 

2 i— 1 

t=l i=l 1 J 

2 Hn) i-1 - g 

^EE / V^ ! _JF J (^( / (X))) 2 ]E^_J^]^ 

t=l 7=1 



applying the Cauchy-Schwarz inequality and the notation

E 9 *[\ : = E[.|Xi, . . .tX^Xj+i, . . .,X k[n) ] and 
W^G^i-E^JG^i], 

where G^_! = E[/(X)|X 1; . . . ,X,_J - E^' [e[/(X)|Xi, . . . ^.J 



As one can see in [Cat03], $F_j(F_i^2(f(X)))$ and W can be estimated in terms of $\Delta_i f(X)$ and
$\Delta_i \Delta_j f(X)$, independently of the variable of integration α. This leads to the inequality stated
in Theorem 2.1. □



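Proof of Theorem 1.9. To use the Gärtner-Ellis theorem (see [DZ98, Theorem 2.3.6]) we
have to calculate the limit of

$$\frac{1}{s_n} \log E \exp\left( \lambda s_n \frac{f(X) - E[f(X)]}{t_n} \right) = \frac{1}{s_n} \log E \exp\left( \frac{\lambda s_n}{t_n} f(X) - \frac{\lambda s_n}{t_n} E[f(X)] \right) \qquad (2.14)$$

for $\lambda \in \mathbb{R}$. We apply Theorem 2.1 for $s = \frac{\lambda s_n}{t_n}$ and λ > 0. The right hand side of the
inequality (2.13), divided by $s_n$, converges to zero for large n:

$$\frac{1}{s_n} s^3 d(n) = \lambda^3 \frac{s_n^2}{t_n^3} d(n) \to 0 \quad (n \to \infty) , \qquad (2.15)$$

as assumed in condition (1). Applying (2.13), this leads to the limit

$$\Lambda(\lambda) := \lim_{n \to \infty} \frac{1}{s_n} \log E \exp\left( \lambda s_n \frac{f(X) - E[f(X)]}{t_n} \right) = \lim_{n \to \infty} \frac{1}{s_n} \frac{\lambda^2 s_n^2}{2 t_n^2} V f(X) = \frac{\lambda^2}{2} C , \qquad (2.16)$$

where the last equality follows from condition (2). Λ is finite and differentiable. The same
calculation is true for -f and consequently (2.16) holds for all $\lambda \in \mathbb{R}$. Hence we are
able to apply the Gärtner-Ellis theorem. This proves a moderate deviation principle for
$\left( \frac{f(X) - E[f(X)]}{t_n} \right)_n$ with speed $s_n$ and rate function

$$I(x) = \sup_{\lambda \in \mathbb{R}} \left\{ \lambda x - \frac{\lambda^2}{2} C \right\} = \frac{x^2}{2C} .$$

□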
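
The last step of the proof is the Legendre transform of the quadratic limit Λ(λ) = λ²C/2, whose supremum is attained at λ = x/C. As a quick numerical sanity check, one can approximate the supremum on a grid; the function name and grid parameters below are illustrative assumptions, not part of the paper.

```python
def legendre_sup(x, c, lam_max=50.0, steps=100_001):
    """Grid approximation of sup over lambda in [-lam_max, lam_max] of lambda*x - lambda^2*c/2."""
    best = float("-inf")
    for s in range(steps):
        lam = -lam_max + 2 * lam_max * s / (steps - 1)
        best = max(best, lam * x - lam * lam * c / 2)
    return best
```

The grid value agrees with the closed form x²/(2C) up to the grid resolution, provided the maximiser x/C lies inside the interval that is scanned.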



3. Moderate Deviations for Non-degenerate U-statistics

In this section we show three applications of Theorem 1.9. We start with the simplest
case:

3.1. Sample mean. Let $X_1, \dots, X_n$ be independent and identically distributed random
variables with values in a compact set [-r, r], r > 0 fixed, and positive variance, as well as $Y_1, \dots, Y_n$
independent copies. To apply Theorem 1.9 for $f(X) = \frac{1}{\sqrt{n}} \sum_{m=1}^{n} X_m$, the partial differences
of f have to tend to zero fast enough as n tends to infinity:

$$|\Delta_i f(X_1^n; Y_i)| = \frac{1}{\sqrt{n}} |X_i - Y_i| \le \frac{2r}{\sqrt{n}} \qquad (3.17)$$

$$\Delta_i \Delta_j f(X_1^n; Y_j, Y_i) = 0 \qquad (3.18)$$

Let $a_n$ be a sequence with $\lim_{n \to \infty} \frac{\sqrt{n}}{a_n} = 0$ and $\lim_{n \to \infty} \frac{n}{a_n} = \infty$. For $t_n = \frac{a_n}{\sqrt{n}}$ and $s_n = \frac{a_n^2}{n}$
the conditions of Theorem 1.9 are satisfied:

(1) $\frac{s_n^2}{t_n^3} d(n) \le \frac{a_n}{\sqrt{n}} \sum_{m=1}^{n} \frac{4 r^2}{n} \frac{2r}{3 \sqrt{n}} = \frac{a_n}{n} \frac{8 r^3}{3}$. Because d(n) is positive this implies $\lim_{n \to \infty} \frac{s_n^2}{t_n^3} d(n) = 0$.

(2) $\frac{s_n}{t_n^2} V f(X) = V\left( \frac{1}{\sqrt{n}} \sum_{m=1}^{n} X_m \right) = V(X_1)$.

The application of Theorem 1.9 proves the MDP for $\frac{1}{a_n} \left( \sum_{m=1}^{n} X_m - n E X_1 \right)$ with speed $s_n = \frac{a_n^2}{n}$
and rate function $I(x) = \frac{x^2}{2 V(X_1)}$. This result is well known, see for example [DZ98, Theorem
3.7.1] and references therein. The MDP can be proved under local exponential moment
conditions on $X_1$: $E(\exp(\lambda X_1)) < \infty$ for a λ > 0. In [Cat03], the bounds of the log-Laplace
transform are obtained under exponential moment conditions. Applying this result, we
would be able to obtain the MDP under exponential moment conditions, but this is not the
focus of this paper.

3.2. Non-degenerate U-statistics with independent entries. Let $X_1, \dots, X_n$ be
independent and identically distributed random variables with values in a measurable space $\mathcal{X}$.
For a measurable and symmetric function $h : \mathcal{X}^m \to \mathbb{R}$ we define

$$U_n(h) := \frac{1}{\binom{n}{m}} \sum_{1 \le i_1 < \dots < i_m \le n} h(X_{i_1}, \dots, X_{i_m}) ,$$

where symmetric means invariant under all permutations of its arguments. $U_n(h)$ is called a
U-statistic with kernel h and degree m.
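
The definition translates directly into code. The sketch below is illustrative only (the names `u_statistic` and `variance_kernel` are not from the paper); it evaluates $U_n(h)$ by enumerating all m-subsets and uses the degree-2 kernel $h(x_1, x_2) = (x_1 - x_2)^2 / 2$, for which $U_n(h)$ is the unbiased sample variance.

```python
from itertools import combinations
from math import comb
import statistics


def u_statistic(h, data, m):
    """U_n(h): average of the symmetric kernel h over all m-subsets of the sample."""
    n = len(data)
    return sum(h(*subset) for subset in combinations(data, m)) / comb(n, m)


def variance_kernel(x1, x2):
    """Degree-2 kernel whose U-statistic is the unbiased sample variance."""
    return (x1 - x2) ** 2 / 2
```

For the sample [1, 2, 4, 7] the six pairwise kernel values sum to 42, so `u_statistic(variance_kernel, [1.0, 2.0, 4.0, 7.0], 2)` agrees with `statistics.variance` on the same data.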
Define the conditional expectation for c = 1, ..., m by

$$h_c(x_1, \dots, x_c) := E[h(x_1, \dots, x_c, X_{c+1}, \dots, X_m)] = E[h(X_1, \dots, X_m) \mid X_1 = x_1, \dots, X_c = x_c]$$

and the variances by $\sigma_c^2 := V[h_c(X_1, \dots, X_c)]$. A U-statistic is called degenerate of order d
if and only if $0 = \sigma_1^2 = \dots = \sigma_d^2 < \sigma_{d+1}^2$ and non-degenerate if $\sigma_1^2 > 0$.

By the Hoeffding decomposition (see for example [Lee90]), we know that for every
symmetric function h, the U-statistic can be decomposed into a sum of degenerate U-statistics
of different orders. In the degenerate case the linear term of this decomposition disappears.
Eichelsbacher and Schmock showed the MDP for non-degenerate U-statistics in [ES03]; the
proof used the fact that the linear term in the Hoeffding decomposition is leading in the
non-degenerate case. In this article the observed U-statistic is assumed to be of the latter
case.

We show the MDP for appropriately scaled U-statistics without applying Hoeffding's
decomposition. The scaled U-statistic $f := \sqrt{n}\, U_n(h)$ with bounded kernel h and degree 2
fulfils the inequality

$$|\Delta_k f(x_1^n; y_k)| = \frac{2}{\sqrt{n} (n-1)} \left| \sum_{i=1}^{k-1} \left( h(x_i, x_k) - h(x_i, y_k) \right) + \sum_{j=k+1}^{n} \left( h(x_k, x_j) - h(y_k, x_j) \right) \right| \le \frac{2}{\sqrt{n} (n-1)} (n-1)\, 2 \|h\|_\infty = \frac{4 \|h\|_\infty}{\sqrt{n}}$$

for k = 1, ..., n. Analogously one can write down all summations of the kernel h for
$\Delta_m \Delta_k f(x_1^n; y_k, y_m)$. Most terms add up to zero and we get:

$$|\Delta_m \Delta_k f(x_1^n; y_k, y_m)| = \frac{2\, |h(x_k, x_m) - h(y_k, x_m) - h(x_k, y_m) + h(y_k, y_m)|}{\sqrt{n} (n-1)} \le \frac{8 \|h\|_\infty}{\sqrt{n} (n-1)} \le \frac{16 \|h\|_\infty}{n^{3/2}} .$$

Let $a_n$ be a sequence with $\lim_{n \to \infty} \frac{\sqrt{n}}{a_n} = 0$ and $\lim_{n \to \infty} \frac{n}{a_n} = \infty$. The aim is the MDP for
a real random variable of the kind $\frac{n}{a_n} U_n(h)$ and the speed $s_n := \frac{a_n^2}{n}$. To apply Theorem 1.9
for $f(X) = \sqrt{n}\, U_n(h)(X)$, $s_n$ as above and $t_n := \frac{a_n}{\sqrt{n}}$ we obtain

(1) $\frac{s_n^2}{t_n^3} d(n) \le \frac{a_n}{\sqrt{n}}\, n\, \frac{16 \|h\|_\infty^2}{n} \left( \frac{4 \|h\|_\infty}{3 \sqrt{n}} + \frac{n}{4} \frac{16 \|h\|_\infty}{n^{3/2}} \right) = \frac{a_n}{n} \frac{256 \|h\|_\infty^3}{3}$. The right hand side converges to 0,
because $\lim_{n \to \infty} a_n / n = 0$.

(2) $\frac{s_n}{t_n^2} V f(X) = V(\sqrt{n}\, U_n(h)(X)) \to 4 \sigma_1^2$ as $n \to \infty$, see Theorem 3 in [Lee90, Chapter 1.3].
The non-degeneracy of $U_n(h)$ implies that $4 \sigma_1^2 > 0$.

The application of Theorem 1.9 proves:

Theorem 3.1. Let $(a_n)_n \in (0, \infty)^{\mathbb{N}}$ be a sequence with $\lim_{n \to \infty} \frac{\sqrt{n}}{a_n} = 0$ and $\lim_{n \to \infty} \frac{n}{a_n} = \infty$.
Then the sequence of non-degenerate and centered U-statistics $\left( \frac{n}{a_n} U_n(h) \right)_n$ with a real-valued,
symmetric and bounded kernel function h satisfies the MDP with speed $s_n := \frac{a_n^2}{n}$ and good
rate function

$$I(x) = \sup_{\lambda \in \mathbb{R}} \left\{ \lambda x - 2 \lambda^2 \sigma_1^2 \right\} = \frac{x^2}{8 \sigma_1^2} .$$

Remark 3.2. Theorem 3.1 also holds if the kernel function h depends on i and j, e.g. the
U-statistic is of the form $\frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} h_{i,j}(X_i, X_j)$. One can see this in the estimation of
$\Delta_i f(X)$ and $\Delta_i \Delta_j f(X)$. This is an improvement of the result in [ES03].

Remark 3.3. We considered U-statistics with degree 2. For degree m > 2 we get the
following estimates for the partial differences of

$$f(X) := \frac{\sqrt{n}}{\binom{n}{m}} \sum_{1 \le i_1 < \dots < i_m \le n} h(X_{i_1}, \dots, X_{i_m}) :$$

$$|\Delta_i f(X)| \le \frac{\sqrt{n}}{\binom{n}{m}} \binom{n-1}{m-1}\, 2 \|h\|_\infty = \frac{2m}{\sqrt{n}} \|h\|_\infty ,$$

$$|\Delta_i \Delta_j f(X)| \le \frac{\sqrt{n}}{\binom{n}{m}} \binom{n-2}{m-2}\, 4 \|h\|_\infty = \frac{4 m (m-1)}{\sqrt{n} (n-1)} \|h\|_\infty ,$$

and Theorem 1.9 can be applied as before.

Theorem 3.1 is proved in [ES03] in a more general context. Eichelsbacher and Schmock
showed a moderate deviation principle for degenerate and non-degenerate U-statistics with
a kernel function h, which is bounded or satisfies exponential moment conditions (see also
[Eic98, Eic01]).


Example 1: Consider the sample variance, which is a U-statistic of degree 2 with
kernel $h(x_1, x_2) = \frac{1}{2} (x_1 - x_2)^2$. Let the random variables $X_i$, i = 1, ..., n, be restricted to
take values in a compact interval. A simple calculation shows

$$\sigma_1^2 = V[h_1(X_1)] = \frac{1}{4} V[(X_1 - E X_1)^2] = \frac{1}{4} \left( E[(X_1 - E X_1)^4] - (V X_1)^2 \right) .$$

The U-statistic is non-degenerate if the condition $E[(X_1 - E X_1)^4] > (V X_1)^2$ is satisfied.
Then $\left( \frac{n}{a_n (n-1)} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right)_n$ satisfies the MDP with speed $\frac{a_n^2}{n}$ and good rate function

$$I(x) = \frac{x^2}{8 \sigma_1^2} = \frac{x^2}{2 \left( E[(X_1 - E X_1)^4] - (V X_1)^2 \right)} .$$

In the case of independent Bernoulli random variables with $P(X_1 = 1) = 1 - P(X_1 = 0) = p$,
0 < p < 1, the sample variance is a non-degenerate U-statistic for $p \ne \frac{1}{2}$ and the corresponding rate
function is given by:

$$I_{\mathrm{Bernoulli}}(x) = \frac{x^2}{2 p (1-p) (1 - 4 p (1-p))} .$$

Example 2: The sample second moment is defined by the kernel function $h(x_1, x_2) = x_1 x_2$.
This leads to

$$\sigma_1^2 = V(h_1(X_1)) = V(X_1 E X_1) = (E X_1)^2\, V X_1 .$$

The condition $\sigma_1^2 > 0$ is satisfied if the expectation and the variance of the observed random
variables are unequal to zero. The values of the random variables have to be in a compact
interval as in the example above. Under these conditions $\frac{2}{a_n (n-1)} \sum_{1 \le i < j \le n} X_i X_j$ satisfies the MDP
with speed $\frac{a_n^2}{n}$ and good rate function

$$I^{\mathrm{sec}}(x) = \frac{x^2}{8 (E X_1)^2\, V X_1} .$$

For independent Bernoulli random variables the rate function for all 0 < p < 1 is:

$$I^{\mathrm{sec}}_{\mathrm{Bernoulli}}(x) = \frac{x^2}{8 p^3 (1-p)} .$$
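
The two Bernoulli rate functions follow from the general formula $I(x) = x^2 / (8 \sigma_1^2)$ once $\sigma_1^2$ is computed from the moments of a Bernoulli(p) variable. The following sketch, with illustrative function names and exact rational arithmetic, checks this arithmetic; it assumes the expressions for $\sigma_1^2$ stated in Examples 1 and 2.

```python
from fractions import Fraction


def sigma1_sq_sample_variance(p):
    """sigma_1^2 = (E[(X - EX)^4] - (VX)^2) / 4 for X ~ Bernoulli(p)."""
    q = 1 - p
    fourth = p * q ** 4 + q * p ** 4  # E[(X - p)^4]
    return (fourth - (p * q) ** 2) / 4


def sigma1_sq_second_moment(p):
    """sigma_1^2 = (EX)^2 * VX for X ~ Bernoulli(p)."""
    return p ** 2 * p * (1 - p)


def rate(x, sigma1_sq):
    """I(x) = x^2 / (8 sigma_1^2), the rate function of Theorem 3.1."""
    return x ** 2 / (8 * sigma1_sq)
```

With p = 1/4 the first function returns 3/256, which equals p(1-p)(1-4p(1-p))/4, and for p = 1/2 it returns 0, in line with the degeneracy of the sample variance at p = 1/2.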

Example 3: Wilcoxon one sample statistic. Let $X_1, \dots, X_n$ be real-valued, independent
and identically distributed random variables with absolutely continuous distribution function
symmetric about zero. We prove the MDP for the properly rescaled

$$W_n = \sum_{1 \le i < j \le n} 1_{\{X_i + X_j > 0\}} = \binom{n}{2} U_n(h) ,$$

defining $h(x_1, x_2) := 1_{\{x_1 + x_2 > 0\}}$ for all $x_1, x_2 \in \mathbb{R}$. Under these assumptions one can calculate
$\sigma_1^2 = \mathrm{Cov}(h(X_1, X_2), h(X_2, X_3)) = \frac{1}{12}$. Applying Theorem 3.1 as before, we obtain the MDP
for the Wilcoxon one sample statistic $\frac{2}{(n-1) a_n} \left( W_n - \frac{1}{2} \binom{n}{2} \right)$ with speed $\frac{a_n^2}{n}$ and good rate
function $I^{W}(x) = \frac{3}{2} x^2$.



3.3. Non-degenerate U-statistics with Markovian entries. The moderate deviation
principle in Theorem 1.9 is stated for independent random variables. Catoni showed in
[Cat03] that the estimation of the logarithm of the Laplace transform can be generalized to
Markov chains via a coupled process. In the following one can see that these results lead,
analogously to the proof of Theorem 1.9, to a moderate deviation principle.

In this section we use the notation introduced in [Cat03, Chapter 3].
Let us assume that $(X_k)_{k \in \mathbb{N}}$ is a Markov chain such that for $X := (X_1, \dots, X_n)$ the
following inequalities hold

$$P(\tau_i > i + k \mid \mathcal{G}_i, X_i) \le A \rho^k \quad \forall k \in \mathbb{N} \text{ a.s.} \qquad (3.19)$$

$$P(\tau_i > i + k \mid \mathcal{F}_n, \overset{i}{Y}_i) \le A \rho^k \quad \forall k \in \mathbb{N} \text{ a.s.} \qquad (3.20)$$

for some positive constants A and ρ < 1. Here $\overset{i}{Y} := (\overset{i}{Y}_1, \dots, \overset{i}{Y}_n)$, i = 1, ..., n, are n coupled
stochastic processes satisfying for any i that $\overset{i}{Y}$ is equal in distribution to X. For the list of
the properties of these coupled processes, see page 14 in [Cat03]. Moreover, the σ-algebra $\mathcal{G}_i$
in (3.19) is generated by $\overset{i}{Y}$, the σ-algebra $\mathcal{F}_n$ in (3.20) is generated by $(X_1, \dots, X_n)$. Finally
the coupling stopping times $\tau_i$ are defined as

$$\tau_i = \inf\{ k \ge i \mid \overset{i}{Y}_k = X_k \} .$$

Now we can state our result:

Theorem 3.4. Let us assume that $(X_k)_{k \in \mathbb{N}}$ is a Markov chain such that for $X := (X_1, \dots, X_n)$
(3.19) and (3.20) hold true. Let $U_n(h)(X)$ be a non-degenerate U-statistic with bounded
kernel function h and $\lim_{n \to \infty} V\left( \sqrt{n}\, U_n(h)(X) \right) < \infty$. Then for every sequence $a_n$, where

$$\lim_{n \to \infty} \frac{a_n}{n} = 0 \quad \text{and} \quad \lim_{n \to \infty} \frac{n}{a_n^2} = 0 ,$$

the sequence $\left( \frac{n}{a_n} U_n(h)(X) \right)_n$ satisfies a moderate deviation principle with speed $s_n = \frac{a_n^2}{n}$ and
rate function I given by

$$I(x) := \sup_{\lambda \in \mathbb{R}} \left\{ \lambda x - \frac{\lambda^2}{2} \lim_{n \to \infty} V\left( \sqrt{n}\, U_n(h)(X) \right) \right\} .$$

Proof. As for the independent case we define $f(X) := \sqrt{n}\, U_n(h)(X_1, \dots, X_n)$. Corollary 3.1
of [Cat03] states that in the above situation the inequality



< 



logEexp (sf(X)) - sE[f(X)] - -Wf(X) 
s 3 BCA 3 (p\og{p- x ) s 



V^(l-P) 3 

«3 



2AB 



+ 



B 3 A 3 AB 2 A i (p\ g(p- 1 ) 



2AB 



fn I 3(1 -p) 3 (1-p) 3 

holds for some constants B and C. This is the situation of Theorem 1.9 except that in this
case d(n) is defined by



1 BCA 3 /plog(p _1 



n (1 — p)' c 



2AB 



-l 



B 3 A 3 AB 2 A 3 fp\og{p~ 1 



2AB 



-r 



n V 3(1 - p) 3 (1-p) 3 
This expression depends on s. We apply the adapted Theorem 1.9 for $s_n = \frac{a_n^2}{n}$, $t_n := \frac{a_n}{\sqrt{n}}$ and
$s := \lambda \frac{s_n}{t_n} = \lambda \frac{a_n}{\sqrt{n}}$ as before.



Because of ^ 
(1) §d{n) - 

n— >oo 



n 

n (1-p) 3 



0, the assumptions of Theorem II .91 are satisfied: 



2AB 



n \ 3(1- p) 



B 3 A 3 _| AB 2 A 3 ( plog Q- 1 ) 



2AB 



-1 



(2) $\frac{s_n}{t_n^2} V f(X) = V\left( \sqrt{n}\, U_n(h)(X) \right) < \infty$ as assumed.

Therefore we can use the Gärtner-Ellis theorem to prove a moderate deviation principle for
$\left( \frac{n}{a_n} U_n(h)(X) \right)_n$. □

Corollary 3.5. Let (Xk)k^n be a strictly stationary, aperiodic and irreducible Markov 
chain which finite state space and U n (h)(X) be a non- degenerate U-statistic based on a 
bounded kernel h of degree two. Then (—U n (h)(X)) n satisfies the MDP with speed and 
rate function as in Theorem \3-4\ 

Proof. The Markov chain is strongly mixing and the absolute regularity coefficient $\beta(n)$ converges to 0 at least exponentially fast as $n$ tends to infinity, see [Bra05], Theorem 3.7(c). Hence the equations (3.19) and (3.20) are satisfied and Theorem 3.4 can be applied. The limit of the variance of $\sqrt{n}\,U_n(h)$ is bounded, see [Lee90], 2.4.2 Theorem 1, which proves the MDP for this example. □
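
To make the setting of Corollary 3.5 concrete, the following Python sketch evaluates a degree-two U-statistic on a simulated path of a two-state Markov chain. It is only an illustration under stated assumptions: the symmetric chain with `stay_prob = 0.7`, the product kernel and all function names are hypothetical choices made for this sketch and are not taken from the paper. The symmetric chain started from the uniform distribution is strictly stationary, aperiodic and irreducible, and the product kernel is bounded on the state space {0, 1}.

```python
import random
from itertools import combinations


def u_statistic(sample, kernel):
    """Degree-two U-statistic: average of kernel(x_i, x_j) over all pairs i < j."""
    pairs = list(combinations(sample, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def two_state_chain(n, stay_prob=0.7, seed=0):
    """Path of length n of a symmetric two-state Markov chain on {0, 1}.

    Assumption of this sketch: the chain keeps its state with probability
    stay_prob and switches otherwise. The uniform start is its stationary law.
    """
    rng = random.Random(seed)
    state = rng.choice((0, 1))
    path = [state]
    for _ in range(n - 1):
        if rng.random() >= stay_prob:
            state = 1 - state
        path.append(state)
    return path


def product_kernel(x, y):
    """Bounded symmetric kernel h(x, y) = x * y on {0, 1} (illustrative choice)."""
    return x * y


if __name__ == "__main__":
    n = 500
    path = two_state_chain(n)
    u_n = u_statistic(path, product_kernel)
    # The statistic studied in the proof of Theorem 3.4 is sqrt(n) * U_n(h)(X).
    print("U_n(h) =", u_n, " sqrt(n) * U_n(h) =", n ** 0.5 * u_n)
```

The quadratic number of pairs keeps this sketch suitable only for moderate n.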



For Doeblin recurrent and aperiodic Markov chains the MDP for additive functionals of a Markov process is proved in [Wu95]. In fact Wu proves the MDP under the condition that 1 is an isolated and simple eigenvalue of the transition probability kernel and that it is the only eigenvalue with modulus 1. For a continuous spectrum of the transition probability kernel Delyon, Juditsky and Liptser present in [DJL06] a method for objects of the form

$$\frac{1}{n^{\alpha}} \sum_{i=1}^{n} H(X_{i-1}), \qquad \frac{1}{2} < \alpha < 1,\ n \ge 1,$$

where $(X_i)_{i\ge 0}$ is a homogeneous ergodic Markov chain and the vector-valued function $H$ is Lipschitz continuous. To the best of our knowledge, we proved the first MDP for a U-statistic with Markovian entries.



4. Proof of Theorem 1.1 and 1.3

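Lemma 4.1. The standardized subgraph count statistic $Z$ satisfies the inequalities

$$\Delta_i Z \le \frac{1}{\sqrt{\binom{n}{2} p(1-p)}\; p^{k-1}} \qquad (4.21)$$

$$\sum_{i=1}^{\binom{n}{2}} \sum_{j=1}^{i-1} \Delta_i \Delta_j Z \le \frac{1}{c_{n,p}} \binom{n}{2} \binom{n-2}{l-2} (l-2)^2 (l-2)! \qquad (4.22)$$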



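Proof of Lemma 4.1. As the first step we will find an upper bound for

$$\Delta_i Z = Z - \frac{1}{c_{n,p}} \sum_{1 \le \kappa_1 < \dots < \kappa_k \le \binom{n}{2}} 1_{\{(e_{\kappa_1}, \dots, e_{\kappa_k}) \sim G\}} \Bigl( \prod_{j=1}^{k} X_{i,\kappa_j} - p^k \Bigr),$$

where $(X_{i,1}, X_{i,2}, \dots, X_{i,\binom{n}{2}}) = (X_1, \dots, X_{i-1}, Y_i, X_{i+1}, \dots, X_{\binom{n}{2}})$ and $Y_i$ is an independent copy of $X_i$, $i \in \{1, \dots, \binom{n}{2}\}$. The difference consists only of those summands which contain the random variable $X_i$ or $Y_i$. The number of subgraphs isomorphic to $G$ and containing a fixed edge is given by

$$\binom{n-2}{l-2} \frac{2k}{a} (l-2)!,$$

see [NW88], p. 307. Therefore we can estimate

$$\Delta_i Z \le \frac{1}{c_{n,p}} \binom{n-2}{l-2} \frac{2k}{a} (l-2)!. \qquad (4.23)$$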
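
The count of copies of G through a fixed edge can be checked by brute force for small n. The Python sketch below is only an illustration: the function names and the example graphs (triangle, path with two edges, 4-cycle) are assumptions of this sketch and do not come from [NW88]. It enumerates all copies of G in the complete graph on n vertices, keeps those containing the edge {0, 1}, and compares the result with $\binom{n-2}{l-2}\frac{2k}{a}(l-2)!$.

```python
from itertools import permutations
from math import comb, factorial


def automorphism_count(edges, l):
    """Order a = aut(G) for a graph G on the vertices 0, ..., l-1."""
    edge_set = {frozenset(e) for e in edges}
    count = 0
    for perm in permutations(range(l)):
        image = {frozenset((perm[u], perm[v])) for u, v in edges}
        if image == edge_set:
            count += 1
    return count


def copies_containing_edge(edges, l, n, fixed_edge=(0, 1)):
    """Brute-force number of copies of G in K_n that contain fixed_edge."""
    copies = set()
    for image in permutations(range(n), l):
        # Each injective vertex map gives one copy; the set removes repeats.
        copy = frozenset(frozenset((image[u], image[v])) for u, v in edges)
        copies.add(copy)
    target = frozenset(fixed_edge)
    return sum(1 for copy in copies if target in copy)


def copies_formula(edges, l, n):
    """Closed form binom(n-2, l-2) * (2k / a) * (l-2)!."""
    k = len(edges)
    a = automorphism_count(edges, l)
    numerator = comb(n - 2, l - 2) * 2 * k * factorial(l - 2)
    # The quotient is an integer because it counts subgraphs.
    return numerator // a


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    path = [(0, 1), (1, 2)]
    four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    for name, edges, l in [("triangle", triangle, 3), ("path", path, 3), ("4-cycle", four_cycle, 4)]:
        print(name, copies_containing_edge(edges, l, 6), copies_formula(edges, l, 6))
```

The enumeration runs over all ordered l-tuples of vertices, so it is meant for small n and l only.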
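For the second step we have to bound the partial difference of order two of the subgraph count statistic.

$$\Delta_i \Delta_j Z = \frac{1}{c_{n,p}} \sum_{1 \le \kappa_1 < \dots < \kappa_{k-2} \le \binom{n}{2}} 1_{\{(e_i, e_j, e_{\kappa_1}, \dots, e_{\kappa_{k-2}}) \sim G\}} \prod_{m=1}^{k-2} X_{\kappa_m}\, X_j (X_i - Y_i)$$

$$\qquad - \frac{1}{c_{n,p}} \sum_{1 \le \kappa_1 < \dots < \kappa_{k-2} \le \binom{n}{2}} 1_{\{(e_i, e_j, e_{\kappa_1}, \dots, e_{\kappa_{k-2}}) \sim G\}} \prod_{m=1}^{k-2} X_{\kappa_m}\, Y_j (X_i - Y_i)$$

$$= \frac{1}{c_{n,p}} \sum_{1 \le \kappa_1 < \dots < \kappa_{k-2} \le \binom{n}{2}} 1_{\{(e_i, e_j, e_{\kappa_1}, \dots, e_{\kappa_{k-2}}) \sim G\}} \prod_{m=1}^{k-2} X_{\kappa_m}\, (X_j - Y_j)(X_i - Y_i).$$
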

Instead of directly bounding the random variables we first take care of cancellations due to the indicator function. We use the information about the fixed graph $G$. To do this we distinguish whether $e_i$ and $e_j$ have a common vertex.

• $e_i$ and $e_j$ have a common vertex:

Because $G$ contains $l$ vertices, we have $\binom{n-3}{l-3}$ possibilities to fix all vertices of the subgraph isomorphic to $G$ and including the edges $e_i$ and $e_j$. The order of the vertices is important and so we have to take the factor $2(l-2)!$ into account.

• $e_i$ and $e_j$ have four different vertices:

Four fixed vertices allow us to choose only $\binom{n-4}{l-4}$ more. The order of the vertices and the relative position of $e_i$ and $e_j$ are relevant. So as before the factor is given by $2(l-2)!$.

Bounding the random variables $X_i, Y_i$, $i \in \{1, \dots, \binom{n}{2}\}$, by 1, we achieve the following estimation:

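$$\sum_{i=1}^{\binom{n}{2}} \sum_{j=1}^{i-1} \Delta_i \Delta_j Z \qquad (4.24)$$

$$\le \frac{1}{c_{n,p}} \binom{n}{2} \Bigl( 4(n-2)\binom{n-3}{l-3} + (n-2)(n-3)\binom{n-4}{l-4} \Bigr) (l-2)! \qquad (4.25)$$

$$\le \frac{1}{c_{n,p}} \binom{n}{2} \binom{n-2}{l-2} (l-2)^2 (l-2)!. \qquad (4.26)$$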



To bound $\Delta_i \Delta_j Z$ for $i$ fixed in (4.24) one has to observe that there are at most $2(n-2)$ indices $j < i$ such that $e_i$ and $e_j$ have a common vertex, and $\frac{1}{2}(n-2)(n-3) = \binom{n}{2} - 2(n-2) - 1$ indices $j$ such that $e_i$ and $e_j$ have no common vertex. This proves the inequality (4.25) and hence (4.26) follows. □
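
The two edge counts used in this argument are easy to confirm by enumeration. The following Python sketch (function name and the choice of the reference edge {0, 1} are assumptions of this illustration) counts, for one fixed edge of the complete graph on n vertices, the other edges that share a vertex with it and those that are vertex-disjoint from it.

```python
from itertools import combinations
from math import comb


def edge_pair_counts(n, edge=(0, 1)):
    """Return (#edges sharing exactly one vertex with edge, #edges disjoint from edge) in K_n."""
    fixed = set(edge)
    shared = 0
    disjoint = 0
    for other in combinations(range(n), 2):
        other_set = set(other)
        if other_set == fixed:
            continue  # skip the fixed edge itself
        if other_set & fixed:
            shared += 1
        else:
            disjoint += 1
    return shared, disjoint


if __name__ == "__main__":
    for n in range(4, 10):
        shared, disjoint = edge_pair_counts(n)
        # Expected: 2(n-2) neighbours and binom(n,2) - 2(n-2) - 1 = (n-2)(n-3)/2 disjoint edges.
        print(n, shared, 2 * (n - 2), disjoint, comb(n, 2) - 2 * (n - 2) - 1)
```

Restricting to indices j < i, as in (4.24), can only decrease these counts, which is why the text speaks of "at most" 2(n-2) indices.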



To apply Theorem 11.91 we choose s n = - - - ^ ^" and t n = 

c n,p C «,P 



2 

2 



because $\lim_{n\to\infty} \mathbb{V}Z = 1$, see [NW88]. We need Lemma 4.1 to bound $d(n)$:



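$$d(n) \le \sum_{i=1}^{\binom{n}{2}} \frac{1}{\binom{n}{2} p^{2k-1}(1-p)} \Bigl( \frac{1}{3\sqrt{\binom{n}{2}}\, p^{k-1/2}(1-p)^{1/2}} + \sum_{j=1}^{i-1} \Delta_i \Delta_j Z \Bigr)$$

$$\le \frac{1}{\binom{n}{2} p^{2k-1}(1-p)} \Bigl( \frac{\sqrt{\binom{n}{2}}}{3 p^{k-1/2}(1-p)^{1/2}} + \frac{\binom{n}{2}\binom{n-2}{l-2}(l-2)^2(l-2)!}{c_{n,p}} \Bigr)$$

$$= \frac{1}{\sqrt{\binom{n}{2}}\, p^{3(k-1/2)}(1-p)^{3/2}} \Bigl( \frac{1}{3} + \frac{(l-2)^2 a}{2k} \Bigr). \qquad (4.27)$$
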

And condition (2) of Theorem 1.9 follows from

s| 1 1 (Z-2) 2 a ^ /2fc 

tr (nj -^(r 2 2 )a)p 4fe - 2 (i-^ 2 U + 2^ ' j 

, if $\beta_n \ll n^l p^{4k-2}(1-p)^2$ as assumed.

$\frac{s_n}{t_n^3}\, d(n)$ is positive and therefore its limit as $n$ tends to infinity is zero, too. With Theorem 1.9 we proved Theorem 1.1.

Remark 4.2. The estimation by Catoni, see Theorem 2.1, and Lemma 4.1 allow us to give an alternative proof for the central limit theorem of the subgraph count statistic $Z$, if
n p3(k-±) n _^^ ^ an( j n 2(j _ py/2 n->oa ^ faese conditions it follows, that d(n) ^— > 0, 

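and it is easy to calculate the following limits:

$$\lim_{n\to\infty} \mathbb{E} e^{\lambda Z} = \lim_{n\to\infty} e^{\frac{\lambda^2}{2}\mathbb{V}Z} = e^{\frac{\lambda^2}{2}} < \infty \quad \text{for all } \lambda > 0$$

and additionally $\lim_{\lambda \searrow 0} \lim_{n\to\infty} \mathbb{E} e^{\lambda Z} = 1$.

Hence the central limit theorem results from the continuity theorem. Both conditions are stronger than the one in [NW88].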

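Proof of Theorem 1.3. We apply Theorem 2.1 and the Chebychev inequality to get

$$P(Z \ge \varepsilon) \le \exp\Bigl( -s\varepsilon + \frac{s^2}{2}\mathbb{V}Z + s^3 d(n) \Bigr)$$

for all $s > 0$ and all $\varepsilon > 0$. Choosing $s = \frac{\varepsilon}{\mathbb{V}Z + 2d(n)\varepsilon/\mathbb{V}Z}$ implies

$$P(Z \ge \varepsilon) \le \exp\Bigl( -\frac{\varepsilon^2}{2\bigl(\mathbb{V}Z + \frac{2d(n)\varepsilon}{\mathbb{V}Z}\bigr)} \Bigr). \qquad (4.28)$$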
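
The step from the exponential bound to (4.28) can be checked numerically. The Python sketch below is an illustration under an explicit assumption: it takes the choice $s = \varepsilon / (\mathbb{V}Z + 2d(n)\varepsilon/\mathbb{V}Z)$, which is how this step is read here, and verifies on a grid of parameter values that the exponent $-s\varepsilon + \frac{s^2}{2}\mathbb{V}Z + s^3 d(n)$ does not exceed $-\varepsilon^2 / \bigl(2(\mathbb{V}Z + 2d(n)\varepsilon/\mathbb{V}Z)\bigr)$. The function names and the grid values are hypothetical.

```python
def chernoff_exponent(s, eps, var_z, d_n):
    """Exponent -s*eps + s^2/2 * VZ + s^3 * d(n) of the exponential bound."""
    return -s * eps + 0.5 * s * s * var_z + s ** 3 * d_n


def chosen_s(eps, var_z, d_n):
    """Assumed choice s = eps / (VZ + 2 d(n) eps / VZ)."""
    return eps / (var_z + 2.0 * d_n * eps / var_z)


def closed_form_exponent(eps, var_z, d_n):
    """Exponent -eps^2 / (2 (VZ + 2 d(n) eps / VZ)) appearing in (4.28)."""
    return -eps ** 2 / (2.0 * (var_z + 2.0 * d_n * eps / var_z))


if __name__ == "__main__":
    for var_z in (0.5, 1.0, 2.0):
        for d_n in (0.0, 0.01, 0.1, 1.0):
            for eps in (0.1, 1.0, 2.0, 10.0):
                s = chosen_s(eps, var_z, d_n)
                lhs = chernoff_exponent(s, eps, var_z, d_n)
                rhs = closed_form_exponent(eps, var_z, d_n)
                # lhs <= rhs up to floating point rounding
                print(var_z, d_n, eps, lhs <= rhs + 1e-12)
```

The inequality rests on $s^3 d(n) \le s^2 d(n)\varepsilon / \mathbb{V}Z$, which holds for this $s$ because the denominator of $s$ is at least $\mathbb{V}Z$.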



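Applying Theorem 2.1 to $-Z$ gives

$$P(Z \le -\varepsilon) \le \exp\Bigl( -\frac{\varepsilon^2}{2\bigl(\mathbb{V}Z + \frac{2d(n)\varepsilon}{\mathbb{V}Z}\bigr)} \Bigr).$$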

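Now we consider an upper bound for the upper tail $P(W - \mathbb{E}W \ge \varepsilon\,\mathbb{E}W) = P\bigl(Z \ge \frac{\varepsilon\,\mathbb{E}W}{c_{n,p}}\bigr)$. Using $\mathbb{V}Z = c_{n,p}^{-2}\,\mathbb{V}W$, inequality (4.28) leads to

$$P(W - \mathbb{E}W \ge \varepsilon\,\mathbb{E}W) \le \exp\Bigl( -\frac{\varepsilon^2 (\mathbb{E}W)^2}{2\mathbb{V}W + 4 d(n)\,\varepsilon\,\mathbb{E}W\, c_{n,p}^3 / \mathbb{V}W} \Bigr).$$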

Indeed, this concentration inequality holds for $f(X) - \mathbb{E}f(X)$ in Theorem 1.9 with $d(n)$ given as in (1.12). We restrict our calculations to the subgraph-counting statistic $W$. We will use the following bounds for $\mathbb{E}W$, $\mathbb{V}W$ and $c_{n,p}$: there are constants, depending only on $l$ and $k$, such that

$$\text{const. } n^l p^k \le \mathbb{E}W \le \text{const. } n^l p^k,$$

$$\text{const. } n^{2l-2} p^{2k-1}(1-p) \le \mathbb{V}W \le \text{const. } n^{2l-2} p^{2k-1}(1-p)$$

(see [Ruc88, 2nd section]), and

$$\text{const. } n^{l-1} p^{k-1/2}(1-p)^{1/2} \le c_{n,p} \le \text{const. } n^{l-1} p^{k-1/2}(1-p)^{1/2}.$$


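Using the upper bound (4.27) for $d(n)$, we obtain

$$P(W - \mathbb{E}W \ge \varepsilon\,\mathbb{E}W) \le \exp\Bigl( -\frac{\text{const. } \varepsilon^2 n^{2l} p^{2k}}{n^{2l-2} p^{2k-1}(1-p) + \text{const. } \varepsilon\, n^{2l-2} p^{-k+1}(1-p)^{-1}} \Bigr),$$

which proves Theorem 1.3. □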

Acknowledgment: The first author has been supported by Studienstiftung des deutschen 